INTRODUCTION


The  property  of  randomness,  or  random  behavior,  is  an  essential  element  in  many  areas  of 
scientific research and application. For example, the presence of random behavior is a key
requirement in digital computer simulation studies. Such a simulation requires a mechanism for
generating  sequences  of  events  in  which  each  sequence  obeys  a  specific  probability  law.  The 
probability  law  most  frequently  encountered  in  simulation  work  assumes  that  events  in  the 
sequence  are  independent  and  identically  distributed;  for  example,  each  event  in  the  sequence 
might  be  assumed  to  follow  a  normal  distribution  with  the  same  mean  and  variance  while  each 
event  occurs  purely  by  chance  and  is  not  related  to  the  occurrence  of  any  other  event  in  the 
sequence. 

It  is  desirable  to  have  a  generating  mechanism  that  can  produce  random  variates  from 
many  different  probability  distributions.  Probability  theory  establishes  the  fact  that  variates 
can  be  generated  from  a  wide  variety  of  distributions,  provided  that  a  sequence  of  independent 
uniform random variates on the interval (0,1) can be generated. In a uniform (0,1) distribution,
each possible number in the range zero to one is equally likely to occur. Thus, the
need for an efficient algorithm for generating uniform variates cannot be overstated.

Computer algorithms for generating random numbers produce sequences that are deterministic;
i.e., each number is completely determined by its predecessor and, therefore, all
numbers in the sequence are determined by the starting number. While such sequences are not
truly random, they appear to be so. Since the actual relationship between one number and its
successor has no physical significance in most applications, this nonrandom character is not
really undesirable. Sequences of numbers generated deterministically are referred to as
pseudo-random.

This report is concerned with recommended statistical methodology for evaluating candidate
pseudo-uniform random number generators; specifically, those that produce sequences
of real numbers $U_0, U_1, U_2, \ldots$ that behave as though each number is independently selected
at random from the uniform distribution with range zero to one. The symbol U(0,1) will
be used to denote a continuous uniform random variable that takes on values between zero and
one.


There are several ways to produce a sequence of numbers on a digital computer that
looks like a sequence of U(0,1) random numbers. The most widely used method involves
generating a sequence of integers $X_0, X_1, X_2, \ldots, X_n$ by means of a linear congruential generator
of the form

$$X_{n+1} = (aX_n + c) \bmod m, \qquad n \ge 0$$

where $X_0 \ge 0$ represents the starting value, $a \ge 0$ is referred to as the multiplier, $c \ge 0$ is
called the increment, and $m$ is the modulus ($m > X_0$, $m > a$, $m > c$). A corresponding sequence
of real numbers is then formed via the relationship $U_n = X_n/m$.
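The recurrence can be sketched in a few lines of Python. This is an illustration only, not the generator evaluated in this report: the names `lcg` and `take` and the toy parameters a = 5, c = 3, m = 16 are assumptions, chosen so that the whole period is short enough to inspect by hand.

```python
def lcg(seed, a, c, m):
    """Yield U_n = X_n / m, where X_0 = seed and X_{n+1} = (a*X_n + c) mod m."""
    x = seed
    while True:
        yield x / m
        x = (a * x + c) % m


def take(gen, count):
    """Collect the next `count` values from a generator."""
    return [next(gen) for _ in range(count)]


# Toy parameters (hypothetical): the sequence of X values starts 7, 6, 1, 8, ...
# and returns to 7 after 16 steps, so the period equals the modulus here.
values = take(lcg(seed=7, a=5, c=3, m=16), 17)
```

With so small a modulus the 16 assumable values are widely spaced on the unit interval, which is exactly the lack of density that a practical generator must avoid.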

Ultimately, all congruential sequences produce a cycle of numbers that is repeated
endlessly. This repeating cycle is referred to as the period of the sequence, and the length of
the period can never exceed m. Since such sequences should have relatively long periods in
order to be useful, the numbers $X_0$, a, c, and m must be properly chosen. The terms
multiplicative congruential generator and mixed congruential generator are commonly used to refer
to linear congruential generators with c = 0 and c ≠ 0, respectively. For a detailed discussion
of the construction of good linear congruential generators as well as a description of other
methods for generating U(0,1) random numbers on digital computers see Knuth (1969).

Thus,  any  candidate  uniform  random  number  generator  should  be  carefully  examined  to 
ensure  that  the  numbers  it  produces  are  adequate  for  the  desired  experimental  purposes.  Of 
prime  importance  is  that  the  candidate  generator  pass  a  collection  of  statistical  tests  designed 
to  expose  departures  from  independence  and  uniformity.  The  failure  of  a  generator  to  possess 
these  properties  can  produce  severely  misleading  results  in  simulation  studies.  Fishman  (1973) 
points  out  that  it  is  also  desirable  for  the  candidate  generator  to  be  dense  and  efficient:  a  dense 
generator  contains  enough  digits  so  that  there  are  no  wide  gaps  between  assumable  values 
on  the  unit  interval;  an  efficient  generator  produces  random  numbers  quickly  and  utilizes 
minimal  storage  in  the  computer.  It  should  be  emphasized  that  random  number  generators 
cannot  be  adequately  evaluated  in  theory.  Instead,  one  must  generate  a  set  of  pseudo-random 
numbers  from  the  candidate  generator  and  perform  statistical  tests  on  them. 

In  the  near  future,  NSWC  will  begin  installing  a  new  general-purpose  computer  system, 
which  will  include  a  machine-dependent  pseudo-uniform  random  number  generator.  Since  the 
use  of  random  number  generators  is  widespread  among  scientists  and  researchers  in  simulation 
and analysis studies at NSWC, the Mathematical Statistics Staff (K106) felt that it was important
to  develop  a  computer  program  that  could  be  used  to  subject  this  generator  to  a  battery  of 
statistical  “tests  of  randomness.”  The  results  of  these  tests  could  then  be  used  to  judge  the 
adequacy of the generator for producing sequences of random variates that give the appearance
of  coming  from  the  U(0,1)  distribution.  In  addition,  this  program  could  also  be  used  to  test 
the  adequacy  of  other  candidate  pseudo-uniform  random  number  generators  designed  for  use 
on  this  or  any  other  computer  system.  The  identification  of  a  “bad”  generator  would  result 
in  its  being  rejected  for  use.  One  or  more  new  generators  would  then  be  constructed  and 
similarly  tested  until  a  “good”  generator  was  found.  Clearly,  the  early  detection  of  “bad” 
pseudo-uniform  random  number  generators  is  highly  desirable. 

The  program  written  to  meet  the  above  requirements,  RANDOM,  is  programmed 
in FORTRAN IV for the CDC 6700 computer system at NSWC. RANDOM performs 11
different  statistical  tests  of  randomness  on  a  single  sequence  of  10,000  pseudo-uniform  random 
numbers  produced  by  the  candidate  generator  (Appendices  A  and  B  present  an  input  guide  and 
sample  output,  respectively,  for  RANDOM).  These  tests  are  referred  to  as  statistical  tests  of 
hypothesis  and  are  designed  to  reveal  departures  from  randomness.  A  statistical  hypothesis 
is a statement to be tested that is either accepted or rejected at a prescribed level of
significance, which was chosen to be five percent for each of these tests.

For an elementary description of the theory of statistical hypothesis testing, the reader
is referred to Walpole and Myers (1978). The number of empirical tests of randomness chosen
for  inclusion  in  RANDOM  is  by  no  means  exhaustive.  Rather,  an  attempt  was  made  to  include 
those  tests  that  have  proven  most  useful  in  characterizing  randomness.  These  11  tests  are 
described  in  detail  in  the  next  section. 

Much  has  appeared  in  the  literature  concerning  random  number  generators.  The  interested 
reader  is  referred  to  Hull  and  Dobell  (1962),  Jansson  (1966),  and  MacLaren  and  Marsaglia 
(1965),  in  addition  to  the  previously  mentioned  sources. 


TESTS  PERFORMED  IN  PROGRAM  RANDOM 


This  section  is  devoted  to  a  detailed  description  of  each  of  the  11  statistical  tests  of 
hypothesis  employed  in  program  RANDOM.  The  order  of  discussion  here  is  identical  to  the 
order  in  which  these  tests  are  executed  in  RANDOM.  Each  test  is  applied  to  the  same  input 
sequence of 10,000 real numbers $U_0, U_1, U_2, \ldots, U_{9999}$, which purports to be uniformly
distributed between zero and one. For additional details and historical remarks on most of
these tests see Knuth (1969).


MEAN  AND  VARIANCE  TESTS 

This  test  (actually  two  separate  tests  that  “belong”  together)  is  more  appropriately  classified 
as a test on moments of a distribution rather than a test on randomness. The U(0,1) distribution has
a  mean  of  0.5  and  a  variance  of  1/12  (equivalently,  a  standard  deviation  of  0.2887).  The  test 
on  the  mean  is  designed  to  determine  whether  or  not  the  sample  average  of  the  10,000  pseudo- 
uniform  variates  properly  approximates  the  hypothesized  mean  of  0.5. 

The sample mean is approximately distributed as a normal random variable with mean
0.5 and variance $(1/12)/10{,}000 = 8.33 \times 10^{-6}$. Hence, for a test at the five-percent level of
significance, the hypothesis that the true mean is 0.5 is rejected if the sample mean lies outside
the interval $0.5 \pm 1.96 \times (8.33 \times 10^{-6})^{1/2}$, or (0.4943, 0.5057).

Similarly,  the  test  on  the  variance  (actually  on  the  standard  deviation)  is  designed  to 
determine  whether  or  not  the  sample  standard  deviation  properly  approximates  the  hypothesized 
standard  deviation  of  0.2887.  Hald  (1952)  shows  that  the  sample  standard  deviation  is 
approximately normally distributed with mean 0.2887 and variance
$(1/12)/20{,}000 = 4.166 \times 10^{-6}$ in this case. Thus, at the five-percent level of significance, the hypothesis
that the true standard deviation is 0.2887 is rejected if the sample standard deviation lies
outside the interval $0.2887 \pm 1.96 \times (4.166 \times 10^{-6})^{1/2}$, or (0.2847, 0.2927).
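The two acceptance intervals can be reproduced with a short Python sketch. It is not the FORTRAN code of program RANDOM; the function name `mean_and_sd_tests` and the use of the n - 1 divisor for the sample standard deviation are assumptions made for illustration.

```python
import math


def mean_and_sd_tests(u, z=1.96):
    """Two-sided five-percent tests of the sample mean and s.d. against U(0,1)."""
    n = len(u)
    mean = sum(u) / n
    # Sample standard deviation; the n - 1 divisor is an assumption of this sketch.
    sd = math.sqrt(sum((x - mean) ** 2 for x in u) / (n - 1))
    var0 = 1.0 / 12.0                          # variance of U(0,1)
    sd0 = math.sqrt(var0)                      # about 0.2887
    mean_half = z * math.sqrt(var0 / n)        # half-width of the interval for the mean
    sd_half = z * math.sqrt(var0 / (2 * n))    # half-width of the interval for the s.d.
    return {
        "mean": mean,
        "mean_interval": (0.5 - mean_half, 0.5 + mean_half),
        "reject_mean": abs(mean - 0.5) > mean_half,
        "sd": sd,
        "sd_interval": (sd0 - sd_half, sd0 + sd_half),
        "reject_sd": abs(sd - sd0) > sd_half,
    }
```

For a sequence of 10,000 numbers the two intervals returned agree, to four decimal places, with the intervals quoted above.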


FREQUENCY  TEST 


To  determine  whether  or  not  the  input  sequence  of  N  =  10,000  real  numbers  consists  of 
numbers  that  are,  in  fact,  uniformly  distributed  between  zero  and  one,  divide  the  theoretical 
range  of  these  numbers  into  100  categories  each  of  length  0.01.  Let 


$p_i$ = probability that an observation falls into category i

$E_i$ = expected number of observations in category i

$O_i$ = number of observations that actually fall into category i

In this case, $p_i = 0.01$ and $E_i = Np_i = 100$ for all i. Then the statistic

$$X^2 = \sum_{i=1}^{100} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{100} \frac{(O_i - 100)^2}{100}$$

is approximately distributed as a chi-square random variable with 99 degrees of freedom.
It  is  this  statistic  upon  which  the  frequency  (or  equidistribution)  test  is  based. 

Note that the statistic $X^2$ is always either positive or zero. If the observed frequencies
are close to the expected frequencies, the value of $X^2$ will be small, while observed frequencies
that are not close to the expected frequencies will produce large values of $X^2$. Small values of
$X^2$ support the hypothesis of uniformity, while large values of $X^2$ lead to rejection of the
hypothesis. At the five-percent level of significance, the critical value for the test is found
to be 123.2253. Thus, if the computed value of $X^2$ equals or exceeds this value, the hypothesis
that  the  input  sequence  contains  numbers  that  are  uniformly  distributed  between  zero  and  one 
is  rejected. 
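A minimal Python sketch of the frequency test follows. The function name `frequency_test` is an assumption, and the critical value is simply the constant quoted in the text for 99 degrees of freedom, so the rejection flag is meaningful only with 100 categories.

```python
def frequency_test(u, cells=100, critical=123.2253):
    """Chi-square test of equidistribution over `cells` equal-width categories."""
    n = len(u)
    counts = [0] * cells
    for x in u:
        # min() keeps the edge case x == 1.0 inside the last category.
        counts[min(int(x * cells), cells - 1)] += 1
    expected = n / cells                       # E_i = N * p_i
    chi2 = sum((o - expected) ** 2 for o in counts) / expected
    return chi2, chi2 >= critical
```

A perfectly even spread gives a statistic of zero, while a sequence concentrated in a single category gives a very large value and is rejected.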

In  addition  to  performing  the  above  test,  program  RANDOM  displays  a  frequency  table 
for  the  100  categories  and  produces  a  graph  of  the  cumulative  frequency  distribution,  both 
based  on  the  input  sequence  of  10,000  numbers. 


KOLMOGOROV-SMIRNOV (K-S) TEST

Let  X  be  a  continuous  random  variable.  Then,  the  cumulative  distribution  function  (c.d.f.) 
of  X,  denoted  by  F(x),  is  defined  by 
$$F(x) = P(X \le x)$$

for all x. Given a sample of N independent observations of X, the empirical c.d.f. $F_N(x)$ is defined as

$$F_N(x) = \frac{\text{number of observations} \le x}{N}$$

$F_N(x)$ will generally differ from $F(x)$. However, if $F_N(x)$ differs from the assumed c.d.f. $F(x)$
by too large a margin, one would have reasonable grounds for rejecting the hypothesis that
$F(x)$ is, in fact, the correct population c.d.f. This reasoning is the basis for the K-S test.

The test statistic is based upon the maximum absolute deviation between the ordinates of
the empirical c.d.f. and the hypothesized c.d.f. at common abscissa values and is given by

$$D_N = \max_x |F_N(x) - F(x)|.$$

If the value of $D_N$ is too large (i.e., if it exceeds a chosen critical value), the hypothesis that
the assumed $F(x)$ is the true $F(x)$ is rejected.


Of concern is the comparison of a sample c.d.f. based on N = 10,000 observations with
an assumed U(0,1) c.d.f. The comparison is to be made using 10,000 different abscissa values.
Using the same symbols defined for the frequency test with $p_i = 0.0001$ and $E_i = Np_i = 1$ for
all i, the appropriate K-S test statistic becomes

$$D_N = \frac{1}{N} \max_{i = 1, 2, \ldots, N} \left| \sum_{j=1}^{i} O_j - (i \times E_i) \right|$$

where N = 10,000. According to Harter (1980), the critical value for this test at the five-percent
level of significance is 0.01356. Hence, if the computed value of $D_{10,000}$ equals
or exceeds this value, the hypothesis that the input sequence represents a sequence of random
variates drawn from a U(0,1) distribution is rejected.
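The binned form of the K-S statistic described above can be sketched in Python as follows. The function name `ks_uniform_binned` is an assumption, and the default critical value is the one quoted for N = 10,000, so the rejection flag should be ignored for other sample sizes.

```python
def ks_uniform_binned(u, critical=0.01356):
    """K-S statistic against U(0,1), evaluated at N equally spaced abscissa values."""
    n = len(u)
    counts = [0] * n                    # N categories of width 1/N, so E_i = 1
    for x in u:
        counts[min(int(x * n), n - 1)] += 1
    cumulative = 0
    worst = 0
    for i, observed in enumerate(counts, start=1):
        cumulative += observed          # observations in categories 1..i
        worst = max(worst, abs(cumulative - i))   # expected cumulative count is i
    d = worst / n                       # convert the count difference to a c.d.f. difference
    return d, d >= critical
```

Dividing the largest cumulative-count discrepancy by N puts the statistic on the same scale as the difference between the empirical and hypothesized c.d.f. values.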


MAXIMUM  OF  t  TEST 

Consider the input sequence of N real numbers $U_0, U_1, \ldots, U_{N-1}$ assumed to have come
from a U(0,1) distribution. Define

$$V_j = \max(U_{tj}, U_{tj+1}, \ldots, U_{tj+t-1})$$

for $0 \le j < n$, where $N = nt$. It is easy to see that $V_j$ has c.d.f. given by

$$F(x) = x^t, \qquad 0 \le x \le 1 \qquad (1)$$

since

$$P[\max(U_1, U_2, \ldots, U_t) \le x] = P[U_1 \le x, U_2 \le x, \ldots, U_t \le x] = x \cdot x \cdots x = x^t.$$

The K-S test can then be applied to the sequence $V_0, V_1, \ldots, V_{n-1}$ by evaluating

$$D_n = \max_x |F_n(x) - F(x)|.$$

If $D_n$ exceeds the prechosen critical value, the hypothesis that the true c.d.f. is given by
Equation 1 is rejected. This, in turn, implies that the input sequence $U_0, U_1, \ldots, U_{N-1}$ does
not represent a sequence of random variates drawn from a U(0,1) distribution.

To  apply  the  maximum  of  t  test  to  the  input  sequence  of  N  =  10,000  observations, 
n  =  100  and  t  =  100  were  selected  so  that  Equation  1  becomes 

$$F(x) = x^{100}, \qquad 0 \le x \le 1.$$

Next, the V's are ordered from smallest to largest; i.e.,

$$V'_0, V'_1, \ldots, V'_{99}$$

where $V'_0 \le V'_1 \le \cdots \le V'_{99}$. The equation

$$x^{100} = c, \qquad 0 \le c \le 1$$

is solved for x, yielding

$$x = c^{1/100}. \qquad (2)$$

Equation 2 is then evaluated at n = 100 values of c; i.e., c = 0.01, 0.02, ..., 1.00. For example,
for c = 0.01, $x \approx 0.9550$. Then, for each x the sequence $V'_0, V'_1, \ldots, V'_{99}$ is used to determine
the value of the sample c.d.f.

$$F_{100}(x) = \frac{\text{number of } V' \text{ values} \le x}{100}.$$
These values are then used in computing the K-S test statistic

$$D_{100} = \max_x |F_{100}(x) - F(x)|.$$

The critical region for a test at the five-percent level of significance is $D_{100} > 0.13403$
(Owen, 1962). Here, $F_{100}(x)$ represents the sample c.d.f. formed by the maximum values
from consecutive blocks of t = 100 numbers and $D_{100}$ is the maximum absolute deviation
between this sample c.d.f. and the corresponding theoretical c.d.f. Therefore, the hypothesis
to be tested is that this maximum deviation is not significantly different from the maximum
deviation obtained had the input sequence come from a U(0,1) distribution. If $D_{100}$ exceeds
the above critical value, this hypothesis is rejected.
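The procedure can be sketched in Python as shown here. The function name `maximum_of_t_test` is an assumption; the default critical value is the one quoted for n = 100 block maxima and does not apply to other values of n.

```python
import bisect


def maximum_of_t_test(u, t=100, critical=0.13403):
    """K-S test of the block maxima V_j against the c.d.f. F(x) = x**t."""
    n = len(u) // t
    # V_j is the largest value in the j-th block of t numbers; sorting gives the V' values.
    v = sorted(max(u[j * t:(j + 1) * t]) for j in range(n))
    d = 0.0
    for k in range(1, n + 1):
        c = k / n                                # theoretical c.d.f. value F(x)
        x = c ** (1.0 / t)                       # Equation 2: abscissa at which F(x) = c
        f_n = bisect.bisect_right(v, x) / n      # fraction of V' values <= x
        d = max(d, abs(f_n - c))
    return d, d > critical
```

Evaluating the sample c.d.f. only at the n abscissa values given by Equation 2 follows the description in the text; a K-S statistic computed at every jump of the sample c.d.f. could be slightly larger.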


GAP  TEST 

Given the input sequence of N = 10,000 real numbers $U_0, U_1, \ldots, U_{9999}$, this test
examines the lengths of “gaps” between occurrences of $U_j$ in some prespecified range. A
chi-square test statistic, akin to the one used in the frequency test, is then used to determine
whether or not the observed numbers of gaps of each length are sufficiently close to their
corresponding expected numbers, which would support the hypothesis that the input sequence
represents a sequence of random variates coming from a U(0,1) distribution.

Choose two real numbers $\alpha$ and $\beta$ such that $0 \le \alpha < \beta \le 1$. Consider the input sequence
$U_0, U_1, \ldots, U_{N-1}$ to be a cyclic sequence with $U_{N+j}$ identified with $U_j$. Next consider the
lengths of consecutive subsequences $U_j, U_{j+1}, \ldots, U_{j+r}$ in which $U_{j+r}$ lies between $\alpha$ and $\beta$
but the other U values do not. Such a subsequence defines a gap of length r. If n of the N
numbers $U_0, U_1, \ldots, U_{N-1}$ fall into the range $\alpha \le U_j < \beta$, then there are n gaps in the
cyclic sequence. Let

$Z_r$ = number of gaps of length r, $0 \le r < t$

$Z_t$ = number of gaps of length t or greater

$p$ = probability that $\alpha \le U_j < \beta$.

Then, $p = \beta - \alpha$. Furthermore, let

$p_r$ = probability of observing a gap of length r ($0 \le r < t$)

$p_t$ = probability of observing a gap of length t or greater.

Then,

$$p_r = \begin{cases} p(1-p)^r, & 0 \le r \le t-1 \\ (1-p)^t, & r = t. \end{cases}$$

It can be shown that the test statistic

$$X^2 = \sum_{r=0}^{t} \frac{(Z_r - np_r)^2}{np_r}$$

is approximately distributed as a chi-square random variable with t degrees of freedom.
In program RANDOM, the values of $\alpha$, $\beta$, and t have been chosen to be

$\alpha = 0.30$

$\beta = 0.60$

$t = 8$


The  critical  value  for  the  chi-square  test  at  the  five-percent  level  of  significance  is  then  found 
to  be  15.5073.  The  hypothesis  to  be  tested  is  that  the  observed  number  of  gaps  of  each  length 
is  not  significantly  different  from  the  number  expected  if  the  input  sequence  had  come  from  a 
U(0,1) distribution. If the computed value of $X^2$ equals or exceeds the above critical value,
this  hypothesis  is  rejected. 
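A Python sketch of the cyclic gap count and the resulting chi-square statistic is given below. The function name `gap_test` and the error raised when no observation falls in the range are assumptions; the default parameters and critical value are those quoted in the text.

```python
def gap_test(u, alpha=0.30, beta=0.60, t=8, critical=15.5073):
    """Chi-square gap test on the sequence u, treated as cyclic."""
    big_n = len(u)
    hits = [i for i, x in enumerate(u) if alpha <= x < beta]
    n = len(hits)                        # n hits give n gaps in the cyclic sequence
    if n == 0:
        raise ValueError("no observation falls in [alpha, beta)")
    z = [0] * (t + 1)                    # z[r] for r < t, z[t] pools lengths >= t
    previous = hits[-1] - big_n          # wrap around: the last hit precedes the first
    for h in hits:
        r = h - previous - 1             # number of misses before this hit
        z[min(r, t)] += 1
        previous = h
    p = beta - alpha
    probs = [p * (1 - p) ** r for r in range(t)] + [(1 - p) ** t]
    chi2 = sum((z[r] - n * probs[r]) ** 2 / (n * probs[r]) for r in range(t + 1))
    return z, chi2, chi2 >= critical
```

For example, the six-number sequence 0.4, 0.1, 0.5, 0.9, 0.9, 0.35 has hits at positions 0, 2, and 5, which gives one gap each of length 0, 1, and 2.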


POKER  TEST 

Consider  subdividing  the  input  sequence  of  N  =  10,000  real  numbers  into  k  =  2000  groups 
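of five successive numbers, $(U_{5j}, U_{5j+1}, \ldots, U_{5j+4})$, $0 \le j < k$. Then, convert the $U_i$ into the
integers $Y_i$ according to the following scheme:

$$Y_i = \begin{cases} 1 & \text{if } U_i \in (0, 0.2] \\ 2 & \text{if } U_i \in (0.2, 0.4] \\ 3 & \text{if } U_i \in (0.4, 0.6] \\ 4 & \text{if } U_i \in (0.6, 0.8] \\ 5 & \text{if } U_i \in (0.8, 1.0] \end{cases}$$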
Each quintuple is then classified into one of five disjoint categories based on the number of
distinct  values  in  the  set  of  five  integers.  Each  quintuple  may  be  thought  of  as  representing 
a  “poker  hand”.  The  five  categories  are 

five  different  =  all  different 

four  different  =  one  pair 

three  different  =  two  pairs,  or  three  of  a  kind 

two  different  =  full  house,  or  four  of  a  kind 

one  different  =  five  of  a  kind. 

A  chi-square  test  based  on  the  observed  and  expected  number  of  quintuples  in  each 
category  can  now  be  used,  provided  that  an  expression  for  the  probability  that  a  quintuple 
contains  m  different  values  can  be  formulated.  Let  pm  represent  this  probability.  In  general, 
consider k groups of n successive integers in which the integers may range from 1 to d,
inclusive. The probability $p_m$ can be formulated as the ratio

$$p_m = \frac{\text{number of n-tuples with exactly m different integers}}{\text{total number of n-tuples}}$$

for m = 1, 2, ..., d. The denominator of this ratio is $d^n$. The numerator is the product of
the number of ways to partition a set of n elements into exactly m nonempty disjoint subsets,
denoted by $S_n^{(m)}$, and the number of permutations of m things from a set of d objects,
namely $d(d-1)\cdots(d-m+1)$. The required probability then becomes

$$p_m = \frac{d(d-1)\cdots(d-m+1)}{d^n} S_n^{(m)}.$$

The notation $S_n^{(m)}$ is used here to denote Stirling numbers of the second kind. (Tables
of $S_n^{(m)}$ may be found in Abramowitz and Stegun, 1964.) In this application, d = 5, n = 5, and
m = 1, 2, 3, 4, 5. The required Stirling numbers are

$S_5^{(1)} = 1$

$S_5^{(2)} = 15$

$S_5^{(3)} = 25$

$S_5^{(4)} = 10$

$S_5^{(5)} = 1$.
Thus, the probabilities that a quintuple contains m = 1, 2, 3, 4, 5 different values are

$p_1 = 0.0016$

$p_2 = 0.0960$

$p_3 = 0.4800$

$p_4 = 0.3840$

$p_5 = 0.0384$

Note that $\sum_{m=1}^{5} p_m = 1$, as required.

Letting

$E_i$ = expected number of quintuples in category i

$O_i$ = observed number of quintuples in category i

for i = 1, 2, ..., t, the test statistic

$$X^2 = \sum_{i=1}^{t} \frac{(O_i - E_i)^2}{E_i}$$

is approximately distributed as a chi-square random variable with t - 1 degrees of freedom.
Since there are k = 2000 quintuples in our application,

$$E_i = kp_i = 2000\,p_i, \qquad i = 1, 2, 3, 4, 5.$$

The  above  chi-square  test  statistic  is  employed  with  t  =  5  categories  and  t  -  1  =  4  degrees  of 
freedom.  The  critical  value  for  this  chi-square  test  at  the  five-percent  level  of  significance  is 
9.4877.  The  hypothesis  being  tested  is  that  the  number  of  poker  hands  of  each  type  does 
not  differ  significantly  from  the  expected  number  of  hands  of  each  type  obtained  when  the 
input sequence does, in fact, come from a U(0,1) distribution. If $X^2$ equals or exceeds the
above  critical  value,  this  hypothesis  is  rejected. 
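The category probabilities and the poker test itself can be sketched in Python as follows. The names `stirling2`, `poker_probabilities`, and `poker_test` are assumptions, and the Stirling numbers are computed from the standard recurrence rather than read from tables.

```python
import math


def stirling2(n, m):
    """Stirling number of the second kind: S(n, m) = m*S(n-1, m) + S(n-1, m-1)."""
    if n == m:
        return 1
    if m == 0 or m > n:
        return 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)


def poker_probabilities(d=5, n=5):
    """p_m for m = 1, 2, ..., min(d, n) different values in an n-tuple."""
    probs = []
    for m in range(1, min(d, n) + 1):
        falling = 1
        for i in range(m):
            falling *= d - i             # d (d - 1) ... (d - m + 1)
        probs.append(falling * stirling2(n, m) / d ** n)
    return probs


def poker_test(u, d=5, n=5, critical=9.4877):
    """Chi-square test on the number of distinct integers in each n-tuple."""
    k = len(u) // n
    probs = poker_probabilities(d, n)
    observed = [0] * len(probs)
    for j in range(k):
        hand = u[j * n:(j + 1) * n]
        # ceil(d * x) maps (0, 0.2] to 1, ..., (0.8, 1.0] to 5, up to rounding at the
        # interval boundaries; max() sends x == 0 to category 1 (an assumption).
        digits = {max(1, math.ceil(d * x)) for x in hand}
        observed[len(digits) - 1] += 1
    chi2 = sum((o - k * p) ** 2 / (k * p) for o, p in zip(observed, probs))
    return observed, chi2, chi2 >= critical
```

Calling `poker_probabilities()` with the defaults reproduces the five probabilities listed above.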


COUPON  COLLECTOR’S  TEST 

This  test  is  related  to  the  poker  test  in  much  the  same  way  as  the  gap  test  is  related  to 
the frequency test. The input sequence of N = 10,000 real numbers $U_0, U_1, \ldots, U_{9999}$ is
converted into the sequence of integers $Y_0, Y_1, \ldots, Y_{9999}$ according to the following scheme:

$$Y_i = \begin{cases} 1 & \text{if } U_i \in [0, 0.2] \\ 2 & \text{if } U_i \in (0.2, 0.4] \\ 3 & \text{if } U_i \in (0.4, 0.6] \\ 4 & \text{if } U_i \in (0.6, 0.8] \\ 5 & \text{if } U_i \in (0.8, 1.0] \end{cases}$$

These integers represent different “coupons” to be collected. A “coupon collector sequence”
is a subsequence of $Y_0, Y_1, \ldots, Y_{9999}$ of the shortest length required to contain each of the
integers one through five at least once. Begin by observing the length of the sequence $Y_0,
Y_1, \ldots$ required to obtain a “complete set” of the integers one to five. If this length is denoted
by r, then the first coupon collector sequence is $Y_0, Y_1, \ldots, Y_{r-1}$. This process is then
repeated starting with $Y_r, Y_{r+1}, \ldots$ to obtain additional coupon collector sequences until the
entire input sequence is exhausted.


In  general,  consider  the  lengths  of  coupon  collector  sequences  that  contain  the  integers 
one  through  d.  Consider  compiling  the  frequencies  of  occurrence  of  the  lengths  of  these 
sequences.  Choose  an  integer  t  >  d  such  that  all  sequences  whose  lengths  are  greater  than  or 
equal  to  t  are  considered  to  be  in  the  same  category.  Then,  each  sequence  length  can  be 
classified  into  one  of  t  -  d  +  1  distinct  categories. 


A chi-square test based on the observed and expected number of coupon collector sequences
in each category can now be developed. Expressions are needed for (1) $p_r$, the probability
that a sequence containing at least one of each of the integers 1, 2, ..., d is of length r, and
(2) $p_t$, the probability that a sequence containing at least one of each of the integers 1, 2,
..., d is of length t or greater. The derivations of $p_r$ and $p_t$ that follow reflect the fact that
the shortest sequence that contains at least one of each of the integers one through d in each
case is desired. Following the development in the poker test, the expression

$$\frac{d!}{d^r} S_r^{(d)}$$

represents the probability that an r-tuple contains exactly d different values, where
$d! = d(d-1)(d-2)\cdots 1$ and $S_r^{(d)}$ denotes a Stirling number of the second kind, as before.
Hence,

$$q_r = 1 - \frac{d!}{d^r} S_r^{(d)}$$

represents the probability that an r-tuple is “incomplete” (i.e., it does not contain all d different
integers). It is clear, then, that

$$p_t = q_{t-1} = 1 - \frac{d!}{d^{t-1}} S_{t-1}^{(d)}.$$

Also, for $d \le r < t$,

$$p_r = q_{r-1} - q_r = \frac{d!}{d^r} \left[ S_r^{(d)} - d\,S_{r-1}^{(d)} \right].$$

From an addition identity involving Stirling numbers of the second kind,

$$S_{r-1}^{(d-1)} = S_r^{(d)} - d\,S_{r-1}^{(d)}.$$

Hence, the required probability can be expressed as

$$p_r = \frac{d!}{d^r} S_{r-1}^{(d-1)}, \qquad d \le r < t.$$

In  programming  this  test  for  inclusion  in  RANDOM,  d  =  5  and  t  =  15  were  chosen.  These 
choices  give  rise  to  the  following  probabilities: 


$p_5$ = 0.038400000

$p_6$ = 0.076800000

$p_7$ = 0.099840000

$p_8$ = 0.107520000

$p_9$ = 0.104509440

$p_{10}$ = 0.095477760

$p_{11}$ = 0.083816448

$p_{12}$ = 0.071639040

$p_{13}$ = 0.060112994

$p_{14}$ = 0.049791565

$p_{15}$ = 0.212092753

Note that $\sum_{r=5}^{15} p_r = 1$, as required.

Let n represent the total number of coupon collector sequences observed in the sequence
$Y_0, Y_1, \ldots, Y_{9999}$. Further, let

$Z_r$ = observed number of coupon collector sequences of length r, $d \le r < t$

and

$Z_t$ = observed number of coupon collector sequences of length t or greater.

Then, the test statistic

$$X^2 = \sum_{r=d}^{t} \frac{(Z_r - np_r)^2}{np_r}$$

is  approximately  distributed  as  a  chi-square  random  variable  with  t  -  d  degrees  of  freedom. 
This test statistic is applied here with t - d + 1 = 11 categories and t - d = 10 degrees of
freedom. The critical value for this chi-square test at the five-percent level of significance is
18.3070. The hypothesis to be tested is that the observed number of coupon collector
sequences of each length does not differ significantly from the expected number obtained
when the input sequence does, in fact, come from a U(0,1) distribution. Hence, if the computed
value of $X^2$ equals or exceeds the above critical value, this hypothesis is rejected.

It is of interest to note that when it is assumed that the input sequence is random, the
number of successive $Y_i$ values that need to be examined, on the average, before a complete set of
“coupons” has been found is 11.4166. Hence, with N = 10,000 numbers, one would expect
to observe 876 coupon collector sequences on the average if one applied the coupon collector's
test repeatedly a large number of times, each time using a different input sequence of
N = 10,000 real numbers $U_0, U_1, \ldots, U_{9999}$.
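The probabilities $p_r$ and the test itself can be sketched in Python as shown here. The names `stirling2`, `coupon_probabilities`, and `coupon_collector_test` are assumptions, as is the decision to discard an incomplete sequence left over at the end of the input.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, m):
    """Stirling number of the second kind: S(n, m) = m*S(n-1, m) + S(n-1, m-1)."""
    if n == m:
        return 1
    if m == 0 or m > n:
        return 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)


def coupon_probabilities(d=5, t=15):
    """Return p_r for d <= r < t, followed by p_t (length t or greater)."""
    fact = math.factorial(d)
    probs = [fact * stirling2(r - 1, d - 1) / d ** r for r in range(d, t)]
    probs.append(1 - fact * stirling2(t - 1, d) / d ** (t - 1))
    return probs


def coupon_collector_test(u, d=5, t=15, critical=18.3070):
    """Chi-square test on the lengths of successive coupon collector sequences."""
    probs = coupon_probabilities(d, t)
    z = [0] * (t - d + 1)                # z[i] counts sequences of length d + i
    seen, length = set(), 0
    for x in u:
        seen.add(max(1, math.ceil(d * x)))   # coupon label 1..d
        length += 1
        if len(seen) == d:               # complete set: record the length, start again
            z[min(length, t) - d] += 1
            seen, length = set(), 0
    n = sum(z)                           # an incomplete trailing sequence is discarded
    if n == 0:
        raise ValueError("no complete coupon collector sequence was observed")
    chi2 = sum((z[i] - n * probs[i]) ** 2 / (n * probs[i]) for i in range(len(z)))
    return z, chi2, chi2 >= critical
```

With the defaults d = 5 and t = 15, `coupon_probabilities()` returns eleven values that agree with the table above to within about one unit in the ninth decimal place.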


PERMUTATION  TEST 

In this test, the input sequence of real numbers is subdivided into k groups of n elements
each, i.e., $(U_{jn}, U_{jn+1}, \ldots, U_{jn+n-1})$, $0 \le j < k$. Assuming that equality between U's does
not occur, the elements in each group can have n! possible relative orderings or permutations.
Since the probability of occurrence of each of these orderings is $p_i = 1/n!$, $i = 1, 2, \ldots, n!$,
a chi-square test with t = n! categories can be applied to the observed and expected number of
n-tuples of each type.

In  programming  this  test  for  inclusion  in  RANDOM,  n  =  3  was  chosen.  Thus,  k  =  3333 
triples  comprising  the  first  9999  numbers  in  the  input  sequence  were  observed.  Let  A,  B,  and  C 
represent  the  elements  in  a  triple  where  A  is  the  smallest  value,  B  the  middle  value,  and  C  the 
largest  value.  Then,  the  3!  =  6  possible  orderings  are  (A,  B,  C),  (A,  C,  B),  (B,  A,  C),  (B,  C,  A), 
(C, A, B), and (C, B, A). An algorithm was used to count the number of times each of these
triple  types  was  observed  in  the  input  sequence.  These  observed  frequencies  were  then  used  in 
evaluating  the  chi-square  test  statistic  with  t  =  6  categories. 

If

$E_i$ = expected number of triples in category i

$O_i$ = observed number of triples in category i

for i = 1, 2, ..., t, then the test statistic

$$X^2 = \sum_{i=1}^{t} \frac{(O_i - E_i)^2}{E_i}$$

is approximately distributed as a chi-square random variable with t - 1 degrees of freedom.
Note that in this application,

$$E_i = kp_i = (3333)(1/6) = 555.50$$

for  all  i.  The  critical  value  for  the  above  chi-square  test  with  t  -  1  =  5  degrees  of  freedom  at 
the  five-percent  level  of  significance  is  11.0705.  The  hypothesis  under  consideration  is  that  the 
observed  number  of  permutations  of  each  type  is  not  significantly  different  from  the  expected 
number  obtained  when,  in  fact,  the  input  sequence  comes  from  a  U(0,1)  distribution.  The 
hypothesis is rejected if the computed value of $X^2$ equals or exceeds the above critical value.
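One way to classify each group by its relative ordering is sketched below in Python. The function name `permutation_test` and the use of the sorting permutation as the category label are assumptions; the text does not say which counting algorithm program RANDOM uses.

```python
from itertools import permutations


def permutation_test(u, n=3, critical=11.0705):
    """Chi-square test on the relative orderings of successive groups of n numbers."""
    k = len(u) // n                      # leftover numbers at the end are ignored
    counts = {order: 0 for order in permutations(range(n))}
    for j in range(k):
        group = u[j * n:(j + 1) * n]
        # Label the group by the positions of its elements in increasing order of value.
        order = tuple(sorted(range(n), key=group.__getitem__))
        counts[order] += 1
    expected = k / len(counts)           # E_i = k / n!
    chi2 = sum((o - expected) ** 2 for o in counts.values()) / expected
    return counts, chi2, chi2 >= critical
```

With 10,000 input numbers and n = 3, this uses k = 3333 triples and an expected count of 555.5 per ordering, as in the text.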


RUNS  TEST 

A method for testing the input sequence of real numbers $U_0, U_1, \ldots, U_{9999}$ for “runs
up” and “runs down” is presented. A run is defined as an unbroken sequence of observations
in which all of the numbers are either increasing or decreasing. Consider the subsequence

$$U_i > U_{i+1} < U_{i+2} < \cdots < U_{i+p} > U_{i+p+1}.$$

The numbers $U_{i+1}$ through $U_{i+p}$ define a “run up” of length p. Similarly, the numbers $U_{i+1}$
through $U_{i+p}$ would define a “run down” of length p if the direction of each of the inequality
signs above were reversed. If the input sequence is representative of a sequence of random
numbers drawn from a U(0,1) distribution, then neither the total number of runs up nor the
total number of runs down should be excessively high or excessively low. Moreover, the
frequency with which various lengths of runs up and runs down occur requires careful
examination. The observed numbers of runs up and runs down of a particular length should not
differ greatly from their corresponding expected numbers. In the ensuing paragraphs, tests on
both the number of runs and the lengths of runs will be discussed.

Define

R'_p = number of runs of length p or more

and

R_p = number of runs of length p exactly.

Expressions for the means of R_p and R'_p and the covariances between R'_p and R'_q, R'_p and
R_q, and R_p and R_q are given in Knuth (1969). These expressions are functions of the length of
the input sequence, N. The covariance between R_p and R_q measures the interdependence
between these two variables. If p = q, the covariance between R_p and R_q is equivalent to the
variance of R_p. Wolfowitz (1944) has shown that R_1, R_2, ..., R_{t-1}, R'_t become normally
distributed as N → ∞, with means and covariances given by the aforementioned expressions.
These results are sufficient to permit the development of tests on both the number and lengths
of runs.


Let

R = total number of runs

\[ R = \sum_{i=1}^{t-1} R_i + R'_t. \]

Then, the random variable

\[ Z_R = \frac{R - E(R)}{[Var(R)]^{1/2}} \]

has, for large N, approximately a standard normal distribution (i.e., mean zero and variance
one). Here, E(R) and Var(R) denote the mean and variance of the random variable R, respectively.

The  hypothesis  to  be  tested  is  that  the  observed  number  of  runs  up  is  not  significantly 
different  from  the  expected  number  of  runs  up  obtained  when  the  input  sequence  comes  from 
a  U(0,1)  distribution.  Hence,  to  perform  a  test  at  the  five-percent  level  of  significance  on  the 
total  number  of  runs  up,  compute  ZR  and  compare  it  to  the  interval  (-1.960,  1.960).  If  ZR 
falls  outside  this  interval,  the  hypothesis  is  rejected.  This  same  test  is  applied  to  the  observed 
number  of  runs  down. 

An algorithm that counts both runs up and runs down is employed in RANDOM. The
value of t was chosen to be six in this application. With N = 10,000, E(R) = 5000.50 and
Var(R) = 833.4166 for both runs up and runs down.
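
The test on the total number of runs can be sketched as follows. This is a minimal illustration, not the algorithm used in RANDOM; the function names are hypothetical. It assumes the closed forms E(R) = (N + 1)/2 and Var(R) = (N + 1)/12, which reproduce the values 5000.50 and 833.4166 quoted above for N = 10,000. A new run up begins after every descent, so the number of runs up is one more than the number of descents.

```python
import math
import random

def count_runs_up(u):
    """Number of maximal ascending runs: a new run starts after each descent."""
    return 1 + sum(1 for a, b in zip(u, u[1:]) if a > b)

def runs_z(u, up=True):
    """Standardized total number of runs up (or runs down) in the sequence u."""
    n = len(u)
    # Runs down of u are the runs up of the negated sequence.
    seq = u if up else [-x for x in u]
    r = count_runs_up(seq)
    mean = (n + 1) / 2        # 5000.50 for N = 10,000
    var = (n + 1) / 12        # about 833.4167 for N = 10,000
    return (r - mean) / math.sqrt(var)

rng = random.Random(3571)     # arbitrary seed for the illustration
u = [rng.random() for _ in range(10_000)]
for label, up in (("up", True), ("down", False)):
    z = runs_z(u, up)
    print(f"runs {label}: Z_R = {z:.4f}, reject = {abs(z) > 1.960}")
```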

In developing a test on the lengths of runs up (or runs down), we note that the usual
chi-square test is not applicable, since adjacent runs are not independent. The following
procedure circumvents this difficulty. Let

\[ Q_i = \begin{cases} R_i - E(R_i), & i = 1, 2, \ldots, t - 1 \\ R'_t - E(R'_t), & i = t \end{cases} \]

Let C = (c_{ij}) denote the t × t matrix of covariances between the random variables R_1, R_2, ...,
R_{t-1}, and R'_t; i.e., c_{ij} is the covariance between R_i and R_j for i, j < t, while c_{it} is the
covariance between R_i and R'_t. Now, let the t × t matrix A = (a_{ij}) denote the inverse of C.
Then, the test statistic

\[ X^2 = \sum_{1 \le i, j \le t} Q_i Q_j a_{ij} \]

is distributed as a chi-square random variable with t degrees of freedom when N is large. Again,
t = 6 was chosen in this application. Hence, the number of runs up and runs down of lengths
one through five and six or longer are counted in program RANDOM and the same test is
applied to both runs up and runs down. The critical value for this chi-square test with six
degrees of freedom at the five-percent level of significance is 12.5916. The hypothesis to be
tested is that the observed number of runs up (down) of each length is not significantly
different from the expected number of runs up (down) observed when the input sequence comes
from the U(0,1) distribution. If the computed value of X^2 equals or exceeds the above critical
value, this hypothesis is rejected.
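
The counting step and the quadratic form above can be sketched as follows. The function names are hypothetical, and the expression used for the expected counts, mean(R'_p) = (N + 1)p/(p + 1)! - (p - 1)/p!, is assumed here to be the form given by Knuth; it is consistent with E(R) = 5000.50 for N = 10,000. The inverse covariance matrix A is not reproduced in this sketch and must be supplied from the covariance expressions cited in the text.

```python
from math import factorial

def run_length_counts(u, t=6):
    """Counts of runs up of length 1..t-1 exactly, and of length >= t (last slot)."""
    counts = [0] * t
    length = 1
    for a, b in zip(u, u[1:]):
        if a < b:
            length += 1
        else:                              # a descent closes the current run
            counts[min(length, t) - 1] += 1
            length = 1
    counts[min(length, t) - 1] += 1        # close the final run
    return counts

def expected_run_counts(n, t=6):
    """Expected values E(R_1), ..., E(R_{t-1}), E(R'_t) for a sequence of length n."""
    def mean_at_least(p):                  # assumed form of mean(R'_p)
        return (n + 1) * p / factorial(p + 1) - (p - 1) / factorial(p)
    exact = [mean_at_least(p) - mean_at_least(p + 1) for p in range(1, t)]
    return exact + [mean_at_least(t)]

def quadratic_form(q, a):
    """X^2 = sum over i, j of Q_i * Q_j * a_ij, with a the inverse covariance matrix."""
    return sum(q[i] * q[j] * a[i][j] for i in range(len(q)) for j in range(len(q)))

# Deviations Q_i for a short made-up sequence (illustration only).
u = [0.3, 0.7, 0.9, 0.2, 0.5, 0.1, 0.4, 0.6, 0.8, 0.95, 0.05]
observed = run_length_counts(u)
q = [o - e for o, e in zip(observed, expected_run_counts(len(u)))]
print(observed, [round(x, 3) for x in q])
```

Runs down can be counted with the same function by negating the sequence first.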

It is interesting to note that both the permutation and runs tests do not depend on the
U’s being uniformly distributed; they require only that the probability that U_i = U_j is zero
for i ≠ j. Hence, these tests can be applied to pseudo-random sequences other than those
generated from a U(0,1) distribution.


SERIAL  TEST  FOR  SUCCESSIVE  PAIRS 

This test is designed to determine whether or not successive pairs of numbers are uniformly
and independently distributed. Begin by converting the input sequence of N = 10,000 real
numbers U_0, U_1, ..., U_9999 into a sequence of integers Y_0, Y_1, ..., Y_9999 by multiplying
each U_i by 10 and truncating the fractional part. Hence, 0 ≤ Y_i ≤ 9 for all i. Then, subdivide the
sequence Y_0, Y_1, ..., Y_9999 into 5000 pairs of two successive integers and count the
frequency with which the pair (Y_{2j}, Y_{2j+1}) = (q, r) occurs for 0 ≤ j < 5000, where
0 ≤ q, r < 10. These 5000 observed pairs of integers are then used to fill a 10 x 10 frequency
table in which the first integer of the pair determines the row number and the second integer
determines the column number of the cell to which that pair belongs. Since the probability
that a randomly selected integer pair will belong to a particular cell is 0.01 for all cells, a
chi-square test based on the observed and expected number of pairs in each of the 100 categories
can be used.

Let

E_{ij} = expected number of times the integer pair (i, j) occurred

O_{ij} = observed number of times the integer pair (i, j) occurred

for 0 ≤ i, j ≤ 9. Then the test statistic

\[ X^2 = \sum_{i=0}^{9} \sum_{j=0}^{9} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

is approximately distributed as a chi-square random variable with 99 degrees of freedom.
Since 5000 integer pairs are to be assigned to 100 categories, E_{ij} = 50 for all i and j. The
number of degrees of freedom used in this test is justified by the fact that the E_{ij} are not
estimated from the observed frequencies (Freund, 1962). The critical value for this chi-square
test at the five-percent level of significance is 123.2253. The hypothesis to be tested is that
the observed distribution of successive pairs of numbers does not differ significantly from the
expected distribution for an input sequence of random variates from a U(0,1) distribution.
Hence, if the computed value of X^2 equals or exceeds the above critical value, this hypothesis
is rejected.
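
A minimal sketch of the serial test for successive pairs follows. The function name `serial_pairs_test` and the stand-in generator are assumptions for the illustration, not part of RANDOM. The sketch assumes every input value lies in the half-open interval [0, 1), so that each truncated integer falls in 0 through 9.

```python
import random

CRITICAL_5PCT_DF99 = 123.2253   # critical value quoted in the text (99 d.f., 5% level)

def serial_pairs_test(u, d=10):
    """Chi-square statistic for non-overlapping pairs on a d x d frequency table."""
    table = [[0] * d for _ in range(d)]
    n_pairs = len(u) // 2                 # 5000 pairs for N = 10,000
    for j in range(n_pairs):
        q = int(u[2 * j] * d)             # Y_{2j}: row index
        r = int(u[2 * j + 1] * d)         # Y_{2j+1}: column index
        table[q][r] += 1
    expected = n_pairs / (d * d)          # 5000 / 100 = 50
    x2 = sum((o - expected) ** 2 / expected for row in table for o in row)
    return x2, table

rng = random.Random(3571)                 # arbitrary seed for the illustration
u = [rng.random() for _ in range(10_000)]
x2, table = serial_pairs_test(u)
print(f"X^2 = {x2:.4f}, reject = {x2 >= CRITICAL_5PCT_DF99}")
```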


SERIAL  CORRELATION  TEST 

Consider the input sequence of real numbers U_0, U_1, ..., U_9999. To determine if the
values in this sequence are related in any special way to the values h units apart for some h,
the test for serial correlation, which computes a measure of the amount that U_{i+h} depends
on U_i, is used. This measure of dependency is called the serial correlation of lag h and represents
the correlation between pairs of equally spaced observations from a sample. Let N represent
the number of elements in the sample to be considered. Serial correlation can be defined in
one of two ways (Bennett and Franklin, 1954):

1. In the circular definition, U_{i+h} = U_{i+h-N} is defined for i + h ≥ N. Circular serial
correlation is useful in detecting periodic effects in a sequence of observations.

2. In the noncircular definition, the pairs that depend upon use of the definition U_{i+h}
= U_{i+h-N} for i + h ≥ N are omitted and the serial correlation between the remaining pairs
is computed. Noncircular serial correlation is useful in detecting trends in a series of
observations.

The  test  for  serial  correlation  employed  in  RANDOM  is  taken  from  Wald  and  Wolfowitz 
(1943)  and  is  performed  using  both  the  circular  and  noncircular  definitions.  The  theory  behind 
this  test  requires  that  N  be  a  prime  number;  hence,  N  was  chosen  to  be  9973,  the  largest  prime 
number  less  than  or  equal  to  10,000.  Serial  correlations  are  usually  computed  only  for  small 
values  of  h  in  practice,  since  most  of  the  important  dependencies  occur  at  those  values.  For  this 
reason,  only  lags  one  through  10  were  considered  in  RANDOM.  Thus,  the  first  9973  numbers 
in  the  input  sequence  were  used  to  compute  the  statistic 


\[ R_h = \sum_{i} U_i U_{i+h}, \qquad h = 1, 2, \ldots, 10, \]

where the sum runs over i = 0, 1, 2, ..., 9972 with U_{i+h} = U_{i+h-9973} for i + h ≥ 9973
(circular definition), or over i = 0, 1, 2, ..., 9972 - h (noncircular definition).

The mean and variance of the random variable R_h are given by

\[ E(R_h) = \frac{S_1^2 - S_2}{n - 1} \]

and

\[ Var(R_h) = \frac{S_2^2 - S_4}{n - 1} + \frac{S_1^4 - 4 S_1^2 S_2 + 4 S_1 S_3 + S_2^2 - 2 S_4}{(n - 1)(n - 2)} - \frac{(S_1^2 - S_2)^2}{(n - 1)^2}, \]

where n is the number of observations used (here 9973) and

\[ S_k = \sum_{i=0}^{9972} U_i^k \]

is the kth power sum of the observations. It can be shown that R_h
approaches the normal distribution for large N. Hence, the random variable

\[ Z_h = \frac{R_h - E(R_h)}{[Var(R_h)]^{1/2}} \]



has a standard normal distribution (i.e., mean zero and variance one). To perform the tests
for serial correlation, compute Z_h for all 10 values of h and for both the circular and
noncircular forms. For tests at the five-percent level of significance, compare each computed value
of Z_h to the interval (-1.960, 1.960). The hypothesis under consideration is that the serial
correlation between observations separated by lag h is not significantly different from the
correlation for an input sequence of random variates from the U(0,1) distribution. If any Z_h
falls outside the above interval, the corresponding hypothesis is rejected.
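
The serial correlation statistic and its standardized form can be sketched as below. The function names are hypothetical and the stand-in generator is an assumption. The mean and variance are the power-sum expressions given above; the text does not give separate moments for the noncircular sum, so this sketch applies the same moments to both forms, which should be read as an approximation for the noncircular case.

```python
import math
import random

def serial_sum(u, h, circular=True):
    """R_h = sum of U_i * U_{i+h}; the index wraps modulo n in the circular form."""
    n = len(u)
    last = n if circular else n - h
    return sum(u[i] * u[(i + h) % n] for i in range(last))

def power_sum_moments(u):
    """Mean and variance of R_h expressed through the power sums S_1..S_4."""
    n = len(u)
    s1, s2, s3, s4 = (sum(x ** k for x in u) for k in (1, 2, 3, 4))
    mean = (s1 ** 2 - s2) / (n - 1)
    var = ((s2 ** 2 - s4) / (n - 1)
           + (s1 ** 4 - 4 * s1 ** 2 * s2 + 4 * s1 * s3 + s2 ** 2 - 2 * s4)
           / ((n - 1) * (n - 2))
           - (s1 ** 2 - s2) ** 2 / (n - 1) ** 2)
    return mean, var

def serial_z(u, h, circular=True):
    """Standardized serial correlation statistic Z_h."""
    mean, var = power_sum_moments(u)
    return (serial_sum(u, h, circular) - mean) / math.sqrt(var)

rng = random.Random(3571)                   # arbitrary seed for the illustration
u = [rng.random() for _ in range(9973)]     # prime sample size, as in the text
for h in range(1, 11):
    zc, zn = serial_z(u, h, True), serial_z(u, h, False)
    print(f"h = {h:2d}  circular Z = {zc:7.4f}  noncircular Z = {zn:7.4f}")
```

As a quick check on the variance expression, for the three numbers 1, 2, 3 every circular arrangement gives the same sum, 11, and the expression evaluates to zero.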

It should be noted that the above test does not depend on the U’s being uniformly
distributed. The test is nonparametric in the sense that it assumes only that the U’s represent a
random sample from a distribution with continuous cumulative distribution function.


EVALUATING  THE  CANDIDATE  RANDOM  NUMBER  GENERATOR 


The previous section described 11 statistical tests of hypothesis designed to detect
departures from randomness for the pseudo-uniform random number generator under
consideration. This section will present some guidelines for interpreting the results of these tests; the
case in which the candidate generator fails one or more tests is included.

Before  proceeding  with  a  discussion  of  the  interpretation  of  test  results,  it  would  be 
beneficial  to  reconsider  the  testing  process  itself.  In  testing  for  randomness,  one  looks  for 
behavior  that  is  not  actually  present  in  one’s  sequence,  since  the  process  of  pseudo-random 
number  generation  is  deterministic;  yet,  we  require  that  the  numbers  so  produced  exhibit  the 
semblance  of  randomness  (Overstreet,  1972).  Thus,  some  degree  of  subjective  judgment  should 
accompany  the  evaluation  of  test  results  in  order  to  reach  a  final  decision  to  accept  or  reject 
the  candidate  generator. 

A  statistical  test  of  an  hypothesis  that  leads  to  its  rejection  is  said  to  be  “significant”; 
otherwise,  it  is  “nonsignificant.”  Recall  that  each  of  the  statistical  tests  of  randomness  is 
performed  at  the  five-percent  level  of  significance.  For  an  individual  test,  this  means  that  the 
probability  is  0.05  that  the  test  will  incorrectly  conclude  that  the  candidate  generator  is  not 
sufficiently  random  to  be  useful,  when,  in  fact,  the  generator  does  produce  sequences  of 
numbers  that  exhibit  the  hypothesized  characteristics  of  U(0,1)  variates.  Now,  suppose  that  the 
“good”  candidate  generator  was  used  to  generate  a  large  number  of  sequences  of  length  N  such 
that  each  sequence  was  obtained  through  the  use  of  a  different  starting  “seed”  value.  If  each  of 
these sequences were then subjected to the same statistical test of randomness, the test
procedure would incorrectly conclude that approximately five percent of these sequences were
products of “bad” generators. In other words, even a “good” generator will fail any given one of
these tests about five percent of the time. It is in this sense that the rejection of the hypothesis
of randomness is interpreted at the five-percent level of significance.
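
The five-percent interpretation can be illustrated by a small simulation. The sketch below is an assumption-laden stand-in: it uses Python's built-in generator as the “good” generator, a simple ten-cell frequency chi-square test in place of the tests in RANDOM, and shorter sequences than N = 10,000 to keep the run time small. The critical value 16.919 is the five-percent point of the chi-square distribution with nine degrees of freedom.

```python
import random

CRITICAL_5PCT_DF9 = 16.919   # chi-square, 9 degrees of freedom, 5% level

def frequency_x2(u, bins=10):
    """Chi-square statistic for equal-width cell counts on [0, 1)."""
    counts = [0] * bins
    for x in u:
        counts[int(x * bins)] += 1
    expected = len(u) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

def rejection_rate(n_sequences=400, length=1000):
    """Fraction of sequences, one per seed, rejected at the five-percent level."""
    rejections = 0
    for seed in range(n_sequences):
        rng = random.Random(seed)            # a different starting seed each time
        u = [rng.random() for _ in range(length)]
        if frequency_x2(u) >= CRITICAL_5PCT_DF9:
            rejections += 1
    return rejections / n_sequences

rate = rejection_rate()
print(f"fraction of sequences rejected: {rate:.3f}")
```

With 400 sequences the observed fraction is expected to lie near 0.05, with sampling variation of roughly one percentage point.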

The  collective  interpretation  of  a  set  of  tests  for  randomness  is  difficult.  The  need  for 
making  several  tests  is  clear.  Even  “bad”  generators  will  pass  some  of  these  tests,  while  failing 
others;  hence,  subjecting  a  “bad”  generator  to  only  a  few  tests  may  be  insufficient  to  identify 
it as “bad.” On the other hand, the use of numerous tests is not totally free from criticism,
since some of these tests are dependent, although the degree of dependence is either unknown
or extremely difficult to assess. Hence, when interpreting the results of a series of tests, the
significance levels should be viewed as a general indication rather than as a specific prediction.

Any  statistical  hypothesis  testing  procedure  can  result  in  one  of  two  decision  errors: 
rejection  of  a  correct  hypothesis,  or  acceptance  of  a  false  hypothesis.  The  size  of  each  of 
these  errors  is  measured  in  terms  of  its  probability  of  occurrence.  In  this  report,  the  size  of 
the  first  error  is  controlled  at  0.05,  but  no  attempt  has  been  made  to  assess  the  probability 
of  accepting  a  false  hypothesis.  However,  as  N  increases,  this  latter  probability  decreases. 
Hence,  the  large  N  value  used  in  these  tests  ensures  that  the  probability  of  accepting  a  false 
hypothesis  will  be  reasonably  small.  Since  a  statistical  test  of  hypothesis  is  not  significant  unless 
it  results  in  a  rejection,  a  test  that  does  not  detect  lack  of  randomness  does  not  imply  that  the 
sequence  being  tested  is  random.  For  this  reason,  the  phrase  “fail  to  reject  the  hypothesis” 
rather  than  “accept  the  hypothesis”  is  used  when  stating  the  conclusions.  The  advantage,  then, 
of  performing  several  tests  of  randomness  on  a  sequence  is  that  the  feeling  of  confidence  in 
using  the  proposed  generator  is  increased  if  a  relatively  high  number  of  nondetections  is 
obtained.  A  reliable  generator  (i.e.,  one  that  produces  numbers  that  are  sufficiently  random)  is 
one  that  performs  well  when  subjected  to  extensive  testing. 

One  final  comment  regarding  the  testing  of  pseudo-uniform  random  number  generators 
is  in  order.  A  candidate  generator  should  not  necessarily  be  discarded  just  because  it  fails  one 
or  two  of  the  tests  for  randomness,  since  statistical  testing  permits  failures  for  a  good  generator 
a  small  proportion  of  the  time.  If  a  failure  is  observed  for  a  certain  test,  this  test  should  be 
closely  examined  to  see  if  the  reasons  for  failure  can  be  ascertained.  Was  the  decision  to  reject 
the  hypothesis  a  borderline  one,  or  was  the  test  highly  significant?  Was  the  failure  merely  a 
result  of  random  variation,  or  was  it  the  result  of  a  serious  deficiency  of  the  generating  scheme 
itself?  It  is  recommended  that  program  RANDOM  be  rerun  one  or  more  times  using  different 
input  sequences,  each  generated  by  a  different  starting  seed  value,  in  the  hopes  of  shedding 
some  light  on  these  questions. 

Thus,  no  firm  quantitative  guidelines  currently  exist  for  deciding  whether  or  not  to  accept 
a candidate generator. We note, however, that if we assume that the 11 tests in RANDOM are
all  independent,  then  the  probability  of  obtaining  one  or  more  rejections  in  the  11  tests  is 
about  0.43  if,  in  fact,  all  of  the  hypotheses  are  true!  This  observation  can  serve  as  a  general 
guideline  when  deciding  whether  to  accept  or  reject  a  candidate  generator.  The  decision  is 
still  largely  subjective  and  should  be  made  only  after  careful  examination  of  the  test  results. 
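
The figure of about 0.43 follows from the independence assumption: the probability that none of the 11 tests rejects is 0.95 raised to the 11th power. A short check, with variable names chosen only for the illustration:

```python
ALPHA = 0.05      # significance level of each individual test
N_TESTS = 11      # number of tests in RANDOM

# Probability of at least one rejection when all hypotheses are true
# and the tests are assumed to be independent.
p_any = 1 - (1 - ALPHA) ** N_TESTS
expected_rejections = N_TESTS * ALPHA

print(f"P(one or more rejections) = {p_any:.4f}")        # prints 0.4312
print(f"expected number of rejections = {expected_rejections:.2f}")
```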


APPENDIX A
INPUT GUIDE


The  10,000  data  points  used  as  input  may  be  read  in  from  punched  cards  or  a  permanent 
file.  The  following  data  card  is  required  for  either  option: 


CARD TYPE 1

Columns   Variable   Description                                  Format

1-5       IOP        IOP = 1: data follows on punched cards       I5
                     IOP ≠ 1: data attached via TAPE 5 with
                     input format (E22.14)

11-80     FORM       Format of input data cards (used only        7A10
                     if IOP = 1)


DECK SET UP

ATTACH, LGO, RANDOM, ID = N1W.
ATTACH, SYSLIB.
LIBRARY, SYSLIB.
ATTACH, TAPE5, datafile.    [used if IOP ≠ 1]
LGO.
7/8/9
Card Type 1
data cards if IOP = 1
6/7/8/9




APPENDIX  B 
SAMPLE  OUTPUT 


The  U(0,1)  random  number  generator  currently  in  use  on  the  CDC  6700  computer  system 
at  NSWC  is  called  RANF.  This  generator  has  been  widely  used  in  simulation  and  analysis 
studies at NSWC. Ten thousand U(0,1) variates were generated from RANF using the starting
floating  point  seed  value  of  3571.0  and  stored  on  a  permanent  file.  These  numbers  were  then 
processed  by  program  RANDOM  to  illustrate  the  program’s  output.  The  sample  printout  for 
this  case  is  shown  in  this  appendix. 